# Image-Text Interaction

Gemma 3 4B IT QAT GGUF

The Gemma 3 4B IT model from Google supports multimodal input and long-context processing, and is suitable for text generation and image understanding tasks.

lmstudio-community

Gemma 3 27B IT INT4 GGUF

Gemma 3 is a family of lightweight, cutting-edge open models from Google, built on the same research and technology as the Gemini models. It accepts text and image input, produces text output, and is available in both pretrained and instruction-tuned versions.

BLIP is a unified vision-language pretraining framework. It is trained jointly on language and image data to support both multimodal understanding and generation, and it performs well on visual question answering tasks.

© 2025 AIbase